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Abstract 



We propose a Bayesian expectation-maximization (EM) algorithm for reconstructing structured approximately 
sparse signals via belief propagation. The measurements follow an underdetermined linear model where the 
regression-coefficient vector is the sum of an unknown approximately sparse signal and a zero-mean white Gaussian 
', noise with an unknown variance. The signal is composed of large- and small-magnitude components identified by 

binary state variables whose probabilistic dependence structure is described by a hidden Markov tree. Gaussian 
priors are assigned to the signal coefficients given their state variables and the Jeffreys' noninformative prior is 
^ ^ ■ assigned to the noise variance. Our signal reconstruction scheme is based on an EM iteration that aims at maximizing 

the posterior distribution of the signal and its state variables given the noise variance. We employ a max-product 
algorithm to implement the maximization (M) step of our EM iteration. The proposed EM algorithm estimates the 
vector of state variables as well as solves iteratively a linear system of equations to obtain the corresponding signal 
estimate. We select the noise variance so that the corresponding estimated signal and state variables (obtained upon 
convergence of the EM iteration) have the largest marginal posterior distribution. Our numerical examples show 
that the proposed algorithm achieves better reconstruction performance compared with the state-of-the-art methods. 

>: 

, Index Terms 



Belief propagation, expectation maximization (EM) algorithm, hidden Markov tree (HMT), max-product algo- 
rithm, structured sparsity, sparse signal reconstruction. 

I. INTRODUCTION 

The advent of compressive sampling (compressed sensing) in the past few years has sparked research 



^ 1 activity in sparse signal reconstruction, whose main goal is to estimate the sparsest pxl signal coefficient 
vector s from the x 1 measurement vector y satisfying the following underdetermined system of linear 
equations: y = H s, where H is an N x p sensing matrix and N < p. 

A tree dependency structure is exhibited by the wavelet coefficients of many natural images [1]- 
[5], see also Fig. 1(a) and [2, Fig. 2]. A probabilistic Markov tree structure has been employed to 
model the statistical dependency between the state variables of wavelet coefficients [1]. An approximate 
belief propagation algorithm has been first applied to compressive sampling in [6], which employs 
sparse Rademacher sensing matrices for Bayesian signal reconstruction. Donoho et al. [7] simplified the 
sum-product algorithm by approximating messages with using a Gaussian distribution specified by two 

The material in this manuscript was presented in part at the SPIE Optics+Photonics Symposium, San Diego, CA, August 2012. 
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scalar parameters, leading to their approximate message passing (AMP) algorithm. Following the AMP 
framework, [8] proposed a turbo-AMP structured sparse signal recovery method based on loopy belief 
propagation and turbo equalization and applied it to reconstruct one-dimensional signals; [5] applied the 
turbo-AMP approach to reconstruct compressible images. However, the above references do not employ 
the exact form of the messages and also have the following limitations: Baron et al. [6] rely on sparsity 
of the sensing matrix, the methods by Baron et al. [6] and Donoho et al [7] apply to unstructured signals 
only, and the turbo-AMP approach in [5] and [8] needs columns of the sensing matrix to be normalized, 
see [5, eq. (22)] and [8, Sec. IV. A]. 

In this paper, we combine the hierarchical measurement model in [9] with a Markov tree prior on 
the binary state variables that identify the large- and small-magnitude signal coefficients and develop 
a Bayesian maximum a posteriori (MAP) expectation-maximization (EM) signal reconstruction scheme 
that aims at maximizing the posterior distribution of the signal and its state variables given the noise 
variance, where the maximization (M) step employs a max-product belief propagation algorithm. Unlike 
the previous work, we do not approximate the message form in our belief propagation scheme. Unlike the 
turbo-AMP scheme in [5] and [8], our reconstruction scheme does not require the columns of the sensing 
matrix to be normalized. Since there are no loops in the graphical model behind our M-step objective 
function, the M step of our EM algorithm is exact. In [10], we proposed a similar EM algorithm for a 
random signal model [11] with a purely sparse deterministic signal component and a noninformative prior 
on this component given the binary state variables. We apply a grid search to select the noise variance so 
that the estimated signal and state variables have the largest marginal posterior distribution. 

In Section II, we introduce our measurement and prior models. Section III describes the proposed EM 
algorithm, where the M step implementation via the max-product algorithm is presented in Section III-A. 
The selection of the noise variance parameter is discussed in Section IV. Numerical simulations in 
Section V compare reconstruction performances of the proposed and existing methods. 

We introduce the notation: /„ and 0„xi denote the identity matrix of size n and the n x 1 vector 
of zeros, respectively; and || • ||p are the transpose and Ip norm, respectively; M{x; fj,, S) denotes 
the probability distribution function (pdf) of a multivariate Gaussian random vector x with mean /x and 
covariance matrix E; Inv-x^(o"^; i^,(Tq) denotes the pdf of a scaled inverse chi-square distribution with 
p degrees of freedom and a scale parameter ctq, see [12, p. 50 and App. A]; |T| is the cardinality of 
the set T; v{-) is an invertible operator that transforms the two-dimensional matrix element indices into 
one-dimensional vector element indices. Finally, pn denotes the largest singular value of a matrix H, also 
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known as the spectral norm of H, and "0" denotes the Hadamard (elementwise) product. 

II. Measurement and Prior Models 

We model m. N xl real-valued measurement vector y using the standard additive white Gaussian noise 
measurement model with the likelihood function given by the following pdf [2], [5]: 

Py\s,Ay\s,(y'^)^M{y]Hs,a'^Ip) (1) 

where H is an A?^ x p real-valued sensing matrix with rank(i7) — N satisfying (without loss of generality) 

Ph = 1 (2) 

s — [si, S2, . . . , Sp]^ is an unknown p x 1 real-valued signal coefficient vector, and is the unknown 

noise variance. 

We adopt the Jeffreys' noninformative prior for the variance component cr^: 

pA<y^)^{a^r\ (3) 

Define the vector of binary state variables q — ?2, ■ ■ ■ , £ {0, Ip that determine if the 
magnitudes of the signal components Sj, i = 1,2, ... ,p are small (gj — 0) or large {qi — 1). Assume 
that Si are conditionally independent given and assign the following prior pdf to the signal coefficients: 

PsW,As\q. 'y') -i[W{si\ 0,7V')]*[A^(«.; 0,eV^)]^-« (4a) 

i=l 

where 7^ and are known positive constants and, typically, 7^ ^ e^. Hence, the large- and small- 
magnitude signal coefficients Sj corresponding to = 1 and gj = are modeled as zero-mean Gaussian 
random variables with variances 7^ and u^, respectively. Consequently, 7^ and are relative variances 
(to the noise variance a^) of the large- and small-magnitude signal coefficients. Equivalently, 

Ps\qM^\q: (7^) ^M{s - Op^i,a^D{q)) (4b) 

where 

D{q) = diag{(7')«^ (e^)^-^\ (f {e^-'^^ . . . , {^'Y" (e^)^"^^}. (4c) 

We now introduce the Markov tree prior probability mass function (pmf) on the state variables q^ 
[1], [5]. To make this probability model easier to understand, we introduce two-dimensional signal 
element indices (ii, {2). Recall that the conversion operator v{-) is invertible; hence, there is a one-to-one 

correspondence between the corresponding one- and two-dimensional signal element indices. A parent 



(a) 



(b) 



Fig. 1. (a) Clustering of significant discrete wavelet transform coefficients of a compressed 'Cameraman' image and (b) types of wavelet 
decomposition coefficients: approximation, root, and leaf, whose sets are denoted by .4, T^oot, and TTcaf, respectively. 



wavelet coefficient with a two-dimensional position index {ii,i2) has four children in the finer wavelet 
decomposition level with two-dimensional indices {2ii — 1,2^2 — 1), (2ii — 1,222), (2^1,2^2 — 1) and 
(2ii, 2^2), see Fig. 1(b). The parent-child dependency assumption implies that, if a parent coefficient in a 
certain wavelet decomposition level has small (large) magnitude, then its children coefficients in the next 
finer wavelet decomposition level tend to have small (large) magnitude as well. Denote by p and c the 
numbers of rows and columns of the image, and by L the number of wavelet decomposition levels (tree 
depth). 

We set the prior pmf Pq{q) as follows. In the first wavelet decomposition level (/ = 1), assign 



are the sets of indices of the approximation and root node coefficients and Proot G (0, 1) is a known constant 
denoting the prior probability that a root node signal coefficient has large magnitude, see Fig. 1(b). In the 
levels / = 2, 3, . . . , L, assign 




(5a) 



where 



A = v{{l,2,...,^}x{l,2,. ..,-}) 
;oot=t;({l,2,...,^}x{l,2,...,^})\^ 



(5b) 



(5c) 




1 





(5d) 



where 7r{i) denotes the index of the parent of node i. Here, Ph e (0, 1) and Pl G (0, 1) are known 
constants denoting the probabilities that the signal coefficient Sj is large if the corresponding parent signal 
coefficient is large or small, respectively. 

Our wavelet tree structure consists of |7^oot| trees and spans all signal wavelet coefficients except the 
approximation coefficients; hence, the set of indices of the wavelet coefficients within the trees is 

r = v{{l,2,...,p}x{l,2,...,c})\A (5e) 
Define also the set of leaf variable node indices within the tree structure as 

7reaf = ^^([{l,2,...,p}x{l,2,...,c}]\[{l,2,...,^}x{l,2,...,|}]) (5f) 

see Fig. 1(b). More complex models are possible; see e.g., [3] and [5], which, however, need at least 10 

hyperparameters to specify the prior for the same wavelet tree and did not report large-scale examples. 

Here, we only need 5 tuning parameters Proot, ^h, Pl, 7^, and e^, each with a clear meaning. A fairly crude 

choice of these parameters is sufficient for achieving good reconstruction performance, see Section V. 
The logarithm of the prior pmf Pq{q) is 



Inpq(q) = const + [^lnl(gj = !)] + [ ^ Qi InP^oot + (1 - 9i) ln(l - P,oot) 
+ [ X] Qi Qn{i) In Ph + (1 - Qi) q^(i) ln(l - Ph) 

ieT\7^oot 

+qi (1 - q^{i)) In Pl + (1 - Qi) (1 - q^ii)) ln(l - Pl)] (5g) 
where const denotes the terms that are not functions of q. 

A. Bayesian Inference 

Define the vectors of state variables and signal coefficients 

d^[dl el ■■■ , d, = [q,, s,]" . (6) 

The joint posterior distribution of and cr^ is 

Pe, I y (0, a'^\y) (xpy\ ^,^2 (y \s,a^)p^\ ^2 (s \q, a^) Pq{q) p^^ (a^) 

oc (a2)-(f+^+2)/2 exp[-0.5\\y - H s\\l/a^ - 0.5 s'^ D-\q) s/a^] (^f^y''^'^'"' p^{q) (7) 
which implies 

P0,a^\y(&, (^^\y) 



p + N 



P9\y{G\y) = 



e2 \ 0.5 ELi 9. / r lly - S||2 + P»-l(q) S-l (P+iV)/2 



p + N 



(8b) 
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and 



\y - Hs\\l + s'^ D-\q)si 



,2/) oc exp -0.5 ( — ) pq{q). (8c) 
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For a fixed q, (8b) is maximized with respect to s at 

s{q) = D{q) [I^ + H D{q) H^]-' y. (9) 

which is the Bayesian Unear-model minimum mean-square error (MMSE) estimator of s for a given q 
[13, Theorem 11.1]. As decreases to zero, s{q) becomes more sparse (becoming exactly sparse for 
= 0); as increases, s{q) becomes less sparse. 

Substituting (9) into (8b) yields the following concentrated (profile) marginal posterior. 

maxpeiy{e\y)c,p,{q)i^-) /| 1 (10) 

which is a function of the state variables q only. 

We wish to maximize (8b) with respect to 0, but cannot perform this task directly. Consequently, we 
adopt the following indirect approach: We first develop an EM algorithm for maximizing pq | a'^^yi^ I <^^; v) 
in (8c) for a given cr^ (Section III) and then propose a grid search scheme for selecting the best 
regularization parameter cr^ so that the estimated signal and state variables have the largest marginal 
posterior distribution (8b) (Section IV). 

111. An em Algorithm for Maximizing pe | a^.y{0 \ o-^, y) 
Motivated by [9, Sec. V.A], we introduce the following hierarchical two-stage model: 

Py I {y\z,a')=Af{y; Hz,a^ (I^ - H H^)) (11a) 
Pz\s{z\s)=N{z- s^a'^Ip) (lib) 

where z is an p x 1 vector of missing data. Observe that the assumption (2) guarantees that the covariance 
matrix (/jy _ H H^) in (11a) is positive semidefinite. 

Our EM algorithm for maximizing p0\„2 y{d | cr^, y) in (8c) consists of iterating between the following 
expectation (E) and maximization (M) steps: 

E step: = \z^^ , zi\ z^^^Y = s^'^ + {y - H s^'^) (12) 

and 

Aj) -s\\l + s^D-\q)s , , ^^ , n.^^^^\\- 



M step: 9^+^^ = argmax | - 0.5^ / + ln[pq(q)] + 0.5 In (^^ j ^ g,| (13a) 

^ ^ i=i 

= argmax In Pel 0-2^2(0 | cr^, z^^^) (13b) 

where j denotes the iteration index. For any two consecutive iterations j and j + 1, our EM algorithm 
ensures that the objective posterior function does not decrease [14], i.e. 

P0W-,y{O^'^'^\<^^y) >PO\a^y{0^'^\(T',y). (14) 



To simplify the notation, we omit the dependence of the iterates 6^^'' on cr^ in this section. Denote by 
gC+oo) ^j^^ q{+oo) |.]^g estimates of 6, s, and q obtained upon convergence of the above EM iteration. 
The above EM iteration provides an estimate of the vector of state variables q as well as finds 

the solution (9) of the underlying linear system to obtain the corresponding signal estimate: 

s(+~) = (15) 

As decreases to zero, s(+°°) becomes more sparse; as increases, s(+°°) becomes less sparse. 

Appendix A presents the derivation of the E and M steps in (12) and (13) and the proofs of the 
monotonicity property (14) and the property (15) of the signal estimate upon convergence. 

Note that the M step in (13b) is equivalent to maximizing P0\cr^,z{^ I cr^, z) for the missing data vector 
z — z'^^\ In the following section, we describe efficient maximization of P0\a^^z{^ I 

A. M Step: Maximizing pg \ o-2,z(0 | c"^, z) 
Before we proceed, define 

^^(°) = ^'^^^ " 1 + 72 ^^^^ 

where we omit the dependence of Sj(0) and on Zi to simplify the notation. 
Observe that 

Pe I (t2,z(0 I z) oc pe^ I a2,z(^.A I z) pg^ I a2,z(0r I z) (17) 
where 6^ and df consist of 0i,i E A and 6i,i eT, respectively, and 

Po^la^A^Al'J^z) (x\^l[X{zi;Si,a^)X{si; 0, 7^^) = 1)} (18a) 

Here, (18a) follows from (5a) and (18b) corresponds to the hidden Markov tree (HMT) probabilistic model 
that contains no loops. Fig. 2 depicts an HMT that is a part of the probabilistic model (18b). Maximizing 

Pej^\a^,z{0A I cr^; z^^^) in (18a) with respect to di, i e A yields 

di^ [1, (19) 

where we have used the identity (Bla) in Appendix B. 

We now apply the max-product belief propagation algorithm [15]-[17] to each tree in our wavelet tree 
structure, with the goal to find the mode of p0^|o^2 ^(^t- | cr^, z). We represent the HMT probabilistic 
model for pe^\„2^z(dj- \ cx^, z) via potential functions as [see (18b)] 



(20) 
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Fig. 2. A hidden Markov tree, part of the probabihstic model (18b). 
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M{zi-s,,a^) [Proo,M{s,- 0,7V2)]«« [(l-Proot)A/'(s.; 0,e2a2)]i-^- 



where 
and, for i e T\%oot, 



i G T\Troot /oi „\ 

t e X, ^^^^^ 



root 



(21b) 



Our algorithm for maximizing (20) consists of computing and passing upward and downward messages 
and calculating and maximizing beliefs. 

1) Computing and Passing Upward Messages: We propagate the upward messages from the lowest 
decomposition level (i.e., the leaves) towards the root of the tree. Fig. 3(a) depicts the computation of the 
upward message from variable node 6i to its parent node wherein we also define a child of 0i as a 
variable node Ok with index k E ch(z), where ch{i) is the index set of the children of i: for i = v{ii, ^2), 
ch(z) = {v[{2ii — 1, 2 Z2 — 1)5 (2 ii — 1, 2 ^2), (2 ii, 2 ^2 — 1), (2 ii, 2 i2)) }• Here, we use a circle and an 
edge with an arrow to denote a variable node and a message, respectively. The upward messages have 
the following general form [16]: 



"^i^7r(i)('?7r(i)) = «max|?/'i(0i)?/'i,^(i)(gi,g^(i)) JJ mfc^i(gi)| 

kech{i) 



(22) 
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(23b) 



I/" 



where a > denotes a normalizing constant used for computational stability [16]. For nodes that have 
no children (corresponding to the level L, i.e., i G 7ieaf)> we set the multiplicative term nfcGch(i) ^fc^-i(^i) 
in (22) to one. 

In Appendix B-I, we show that the only two candidates for 6^ in the maximization of (22) are [0, Si(0)]^ 
and [l,Si(l)f, see (16). 

Substituting these candidates into (22) and normalizing the messages yields (see Appendix B-I) 

m,_.(,)(g.(,)) = k"(0)]^-''^« (23a) 

where K(0),/x^(l)f = M-, 

~ max{i/S,^ + max{i/?_. t]^} 
^ [exp(lnmax{i/g^ 77^} - \nmax{ul, 77,"}), l]"^ 
1 + exp(lnmax{i>'Q_j 77^} - lnmax{i/",j 77^^}) 

[1-Pl, PLfQ<t>(zi) (23c) 
[1-Ph, Ph]^0</>N (23d) 

[ [1, IJ , ^ e Tieaf 

(/>(;2)= [exp(-0.5^^)/6, exp(-0.5 ^^)/7]^ (23f) 

and e = > and 7 = ^/t^ > 0. A numerically stable implementation of (23b) that we employ 

is illustrated in the second expression in (23b). Similarly, the elementwise products in (23c)-(23e) are 

implemented as exponentiated sums of logarithms of the product terms. 

2) Computing and Passing Downward Messages: Upon obtaining all the upward messages, we now 
compute the downward messages and propagate them from the root towards the lowest level (i.e., the 
leaves). Fig. 3(b) depicts the computation of the downward message from the parent 07r(i) to the variable 
node di, which involves upward messages to from its other children, i.e. the siblings of Oi, marked 
as 6k, k G sib(i). This downward message also requires the message sent to from its parent node, 
which is the grandparent of di, denoted by dgpU). The downward messages have the following general 
form [16]: 

m^(i)^i(gi) = a max 

7r( 

where a > denotes a normalizing constant used for computational stability. For the variable nodes i 
in the second decomposition level that have no grandparents (i.e., 7r(i) G T^oot). we set the multiplicative 
term mgp(i)^7r(i)(g7r(i)) in (24) to one. 

In Appendix B-II, we show that the only two candidates for in the maximization of (24) are 
[0, S7r(i)(0)]^ and [1, S7r(i)(l)]^> see also (16). Substituting these candidates into (24) and normalizing the 
messages yields (see Appendix B-II) 

m.(,)_,(g,) = [i^fm'-'' [/^'(l)]'^ (25a) 
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(a) 

Fig. 3. Computing and passing (a) upward and (b) downward messages. 



e sib(0 



(b) 



for Ti{i) e r\7reaf, where [i^f{0),fif{l)f = fif and 

[max{i/^, rjf}, max{iyf - r]f}f 



max{i/o • ryf } + max{j/f • ryf } 

[exp(lnmax{i/Q j r/f} — \nmax{i^f ,- '/7f}), l] 
1 + exp(lnmax{i/Q j rjf} - \nmax{uf ^ r]f}) 

A;gsib(i) 



T 



fcGsib(i) 

root) -Proot] ) ""(0 ^ Troot 



M7r(i)' 



(25b) 
(25c) 

(25d) 
(25e) 



7r(i) e (T\7;oot)\7rcaf 

A numerically stable implementation of (25b) that we employ is illustrated in the second expression in 
(25b). 

The above upward and downward messages have discrete representations, which is practically important 
and is a consequence of the fact that we use a Gaussian prior on the signal coefficients, see (4). Indeed, in 
contrast with the existing message passing algorithms for compressive sampling [5]-[8], our max-product 
scheme employs exact messages. 
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3) Maximizing Beliefs: Upon computing and passing all the upward and downward messages, we 
maximize the beliefs, which have the following general form [16]: 

(Qi) Yi "^k-^Mi) (26) 
feech(j) 

for each i e T, where a > is a normalizing constant. [In (26), we set mT^{i)^i{qi) = 1 if i e T^^oot and 
nifcech(i) '^k-^ii^i) = 1 if « e Tieaf-] We then use these beliefs to obtain the mode 

Or = argrnaxp0^|,2_^(07-| (7^ z) (27) 
Or 

where the elements of Oj- are [see (16)] 

?.(g,)]^ = argm^ax6(0,)=l itJlr' > A(0) -^^ ^^Sa) 

I [0, Sj(0)J , otherwise 

and 

/3, = [A(0), = I - f--]I ® t^^^) ® . ^ %[-^ . (28b) 

Here, ai > is a normalizing constant. The detailed derivation for the forms of di and /3j in (28) is 
provided in Appendix B-in. 
In summary, 

= argmaxpe|a2,2(^' I (29) 

where = [el sl - olV and 



e,^% mi)Y={ [1, A(i) > A(o), i e r (30) 

[0, s,(o)]^, A(i)<A(0), ier 



follows by combining (19) and (28a) and we have omitted the dependence of on 2; and 6i on Zi to 
simplify the notation. 

IV. Selecting ct^ 

We can integrate out, yielding the marginal posterior of in (8b), and derive an 'outer' EM iteration 
for maximizing pQ^y{d\y): 

(i) Fix (7^ and apply the EM iteration proposed in Section HI to obtain an estimate d^'^'^\a'^) of d; 

in) Fix d to the value obtained in (i) and estimate cr^ as 

= \\V-Hs\\l^s-D-Kc)s 

^ ' p + N 

Even though it guarantees monotonic increase of the marginal posterior p0\y{d\y), the 'outer' EM 
iteration (i)-(ii) does not work well in practice because it gets stuck in an undesirable local maximum of 
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(7 



d 



Fig. 4. Grid search in selecting a 



Pe\y{6\y). To find a better (generally local) maximum of pe\y{6\y), we apply a grid search over cr^ as 
follows. 

We apply the EM algorithm in Section III using a range of values of the regularization parameter cr^. 
We traverse the grid of K values of sequentially and use the signal estimate from the previous grid 
point to initialize the signal estimation at the current grid point: in particular, we move from a larger cr^ 
(say cTqIjj) to the next smaller cr^p^(< cr^j^) and use s^'^°°\a'^i^) (obtained upon convergence of the EM 
iteration in Section III for cr^ = a^^^) to initialize the EM iteration at cr^^^. The largest on the grid and 
the initial signal estimate at this grid point are selected as 

^MAx = ^|^, ^(°H^MAx) = 02PX1. (32a) 
The consecutive grid points a^^^ and cr^i^ satisfy 

2 

f^new = — (32b) 

where (i > 1 is a constant determining the search resolution. Finally, we select the cr^ from the above 
grid of candidates that yields the largest marginal posterior distribution (8b): 

al = arg max Pe\y{0^^^\cr') \ y) (33) 

e{<AX'<Ax/'^'---'<Ax/'^ } 

and the final estimates of 6 and s as d''^^\a'l) and s*^+°°)(cr^), respectively, see Fig. 4. 

V. Numerical Examples 
We compare the reconstruction performances of the following methods: 



13 

• our proposed max-product EM algorithm in Section EI with the variance parameter cr^ selected using 
the marginal-posterior based criterion in Section IV (labeled MP-EM), search resolution d — 2, and 
MATLAB implementations available at http://home.eng.iastate.edu/~ald/MPEM.html; 

• our max-product EM algorithm in Section III with cr^ tuned manually for good performance (labeled 
MP-EMopt) with d = 2; 

• the turbo-AMP approach [5] with a MATLAB implementation at http://www.ece.osu.edu/~schniter/ 

turboAMPimaging and the tuning parameters chosen as the default values in this implementation; 

• the fixed-point continuation active set algorithm [18] (labeled FPCas) that aims at minimizing the 
Lagrangian cost function 

0.5||y-//s||^ + r||s||i (34a) 
with the regularization parameter r computed as 

T = 10"||i/^2/|U (34b) 

where a is a tuning parameter chosen manually to achieve good reconstruction performance; 

• the Barzilai-Borwein version of the gradient-projection for sparse reconstruction method with debi- 
asing in [19, Sec. III.B] (labeled GPSR) with the convergence threshold tolP = 10~^ and tuning 
parameter a in (34b) chosen manually to achieve good reconstruction performance; 

• the double overrelaxation (DORE) thresholding method in [11, Sec. Ill] or its approximation (DOREapp) 
where the {H H'^)'^ term is approximated by a diagonal matrix, initialized by the zero sparse signal 
estimate: 

= Opxi; (35) 

• the normalized iterative hard thresholding (NIHT) scheme [20] initialized by the zero s^^^ in (35); 

• the model-based iterative hard thresholding (MB-IHT) algorithmn [4] using a greedy tree approxi- 
mation [21], initialized by the zero s'^'^^ in (35). 

For the MP-EM, DORE, NIHT, and MB-IHT iterations, we use the following convergence criterion: 



O'+i) _ «(i)||2 

< 5 (36) 



p 

where 5 > is the convergence threshold selected in the following examples so that the performances of 
the above methods do not change significantly by further decreasing 5. 
The sensing matrix H has the following structure: 

H = —^^) (37) 
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where $ is the N x p sampUng matrix and ^ is the p x p orthogonal sparsifying transform matrix 
(satisfying ^ = Ip). Note that H in (37) satisfies (2). In the following examples, the sensing matrices 
$ are either random Gaussian (Sections V-A and V-B) or structurally random [22] (Section V-C) and 
the sparsifying transform matrices ^ are either identity (Section V-A) or inverse Haar wavelet transform 
matrices (Sections V-B and V-C). We set the tree depth L — A. 

A. Small-scale Structured Sparse Signal Reconstruction 

We generated the binary state variables q of length p — 1024 using the Markov tree model in Section II with 
Pl = 10~^. Conditional on qi, Si are generated according to (4b). Here, the matrix-to-vector conversion 
operator v{-) corresponds to simple columnwise conversion. The entries of the sampling matrix $ in 
(37) are independent, identically distributed (i.i.d.) standard Gaussian random variables and the transform 
matrix ^ in (37) is identity: ^ = Ip. 

We vary the values of 7^, e^, a'^, P^i, and Proot to test the performances of various methods under 
different conditions. Our performance metric is the average mean-square error (MSE) of an estimate s 
of the signal coefficient vector: 

MSE{S} = - ^ll^l (38) 

p 

computed using 500 Monte Carlo trials, where averaging is performed over the random Gaussian sampling 
matrices $, signal s, and measurements y. The expected number of large-magnitude signal coefficients is 

E[i:*]=|;(l+3i:4'/'0 (39a) 

i=l 1=0 

where Pi is the marginal probability that a state variable in the Ith tree level is equal to one, computed 
recursively as follows: 

Pl = Pi.,Pu + (1 - Pi-i)Pl (39b) 

initialized by Pq = Proot- 

NIHT, DORE, and MB-IHT require knowledge of the signal sparsity level r; in this example, we set 

r for these methods to the true signal support size. For cr^ = 1, we select the convergence threshold in 
(36) to 5 = 10""^ and for cr^ = 10~^, we select this convergence threshold to 5 = 10~^°. For GPSR and 
FPCas> we vary a within the set {—1, —2, —3, —4, —5, —6, —7, —8, —9} and, for each N/p and each 
of the two methods, we use the optimal a that achieves the smallest MSE. For MP-EM, we set the grid 

length K = 16. 

Recall that the turbo-AMP approach needs normalized columns of the sensing matrix, see [5, eq. (22)]. 
When applying the turbo-AMP method, we scale the sensing matrix as i^scaie — (1/V^) so that it 
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has approximately normalized columns. With measurements y and scaled sensing matrix -ffscaie> turbo- 
AMP returns the scaled signal estimate Sscaie, and we compute the final turbo-AMP signal estimate as 
(p$/V^) Sscaie> whose performance is evaluated using (38). 

Figs. 5 and 6 show the MSEs of different methods for several choices of 7^, e^, and cr^ where we fix 
^'h = ^root = 0.5 (corresponding to E [JJ^^-^ qi]/p = 0.0918) and consider (j^ e {1, lO"^}, e {0.1, 10}, 
and 7^ e {10^, 10^}. Here, a larger value of the low-signal relative variance implies that the signal 
coefficient vector s is less (approximately) sparse and a larger value of the high-signal relative variance 7^ 
implies a relatively higher signal-to-noise (SNR). Observe that the noise variance cr^ = 10~^ corresponds 
to the noise precision Ija^ — 10^, which is the mean of the prior pdf for l/cr^ used in [5, Sec. IV, 
p. 3444]. 

In Fig. 5, we show the MSEs of various methods as functions of the subsampling factor N j-p for more 
sparse signals (e^ = 0.1), relatively lower SNR (7^ = 10^), and variable noise variance cr^ e {1, 10~^}. 
Observe that turbo-AMP is sensitive to the choice of the noise variance a^: It has the largest MSB for 
— \ and Njp < 0.4, but becomes the second best method for cr^ = 10~^ and most N/p. In contrast, 
MP-EM keeps the best reconstruction performance as varies: The MSE of MP-EM is up to 4.6 times 
smaller than its closest competitor for both cr^ = 1 and cr^ = 10~^. 

The MSEs of most methods are roughly 10^ times smaller in Fig. 5(b) where cr^ = 10~^ than the 
corresponding MSEs in Fig. 5(a) where = 1. However, this is not true for turbo-AMP, which is 
very sensitive to the selection of its prior pdf for the noise precision l/u^. For the noise variance = 
10~^, turbo-AMP performs significantly better than for cr^ = 1 (upon taking into account the scaling 
adjustment by the factor 10~^), which is facilitated by the fact that l/cr^ — 10^ is the mean of the 
prior pdf for l/cr^ used in [5, Sec. IV, p. 3444] and in the corresponding MATLAB implementation at 
http://www.ece.osu.edu/~schniter/turboAMPimaging that we employ. 

The approximate invariance of MP-EM to scaling of the measurements can be explained by the fact that 
the shape of the concentrated marginal posterior distribution (10) (which is a function of state variables 
q only) does not change as we scale the measurements y hy a constant. 

In Fig. 6, we fix cr^ = 10"^^, focus on less (approximately) sparse signals with = 10, and show the 
MSEs of various methods as functions of the subsampling factor N/p for 7^ = 10^ (relatively higher 
SNR) and 72 = 10^ (lower SNRs). When 7^ = 10^, turbo-AMP and MP-EM clearly outperform all other 
methods: turbo-AMP has the smallest MSE for N/p < 0.3. The MSE of turbo-AMP is larger than that of 
MP-EM when N/p > 0.3. When 7^ = 10^, MP-EM outperforms all the other methods except MP-EMqpt 
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Fig. 6. MSEs as functions of the subsampUng factor N/p for Ph 
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Proot = 0.5, 7^ = 10^ = 0.1 and (a) cr^ = 1 and (b) cr^ = 10"®. 
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10 and (a) 7^ = 10^ and (b) 7^ = 10^ 



= Proot = 0.5, (7^ = 10-^ = 



for all the subsampling factors. 

Parts (b) of Figs. 5 and 6 show the MSB performances of various methods for reconstructing signals 
that are more and less (approximately) sparse, respectively, with all other simulation parameters being the 
same. For each method, the more sparse signals can be reconstructed with a smaller MSE than the less 
sparse signals at each subsampling factor N/p: Compare Figs. 5(b) and 6(b). 

In both Figs. 5 and 6, the MSE of MP-EM is close to that of MP-EMopx, which implies that the 
marginal-posterior based criterion in Section IV selects the variance parameter well in this example. 

Both MP-EM and turbo-AMP yield generally non-sparse signal estimates, particularly when the under- 
lying signal s is less (approximately) sparse, i.e., = 10. 
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Fig. 7. MSEs as functions of the expected significant coefficient ratio E [X^^Lj^ QiJ/p for ""^ ~ 10 ^, 7^ = 10'', N j-p — 0.35 and (a) 
= 0.1 and (b) = 10. 



Fig. 7 shows the MSEs of different methods as functions of the normalized expected number of large- 
magnitude signal coefficients E [^^^-,^ (corresponding to the expected significant coefficient ratio), 
obtained by varying Ph = -Proot, where we fix = 10^^, 7^ = 10^, N j-p = 0.35 and consider G 
{0.1, 10}. MP-EMqpt has the smallest MSE for all expected significant coefficient ratios in Fig. 7. MP-EM 
provides a relatively poor performance compared with other methods when E [XliLi Qi] small, implying 
that the marginal-posterior based criterion in Section IV does not select the variance parameter cr^ well 
for very small expected significant coefficient ratios and that manual tuning of is needed in this case. 

For more (approximately) sparse signals with = 0.1 in Fig. 7(a), MP-EM outperforms all other 
methods except MP-EMqpt when E [Y^^=i Qil/p > 0.0655. For less sparse signals with = 10 in Fig. 7(b), 
MP-EM becomes the closest competitor to MP-EMopt for E [XliLi Qil/p ^ 0.0473. For both more and less 
sparse signals, the gap between the MSEs of MP-EM and MP-EMqpt becomes smaller as E [X^iLi 
increases. Turbo-AMP is the second best method when 'E [J2^=i Qi] / P < 0.0655 and E[^^^^gj]/p < 
0.0473 for = 0.1 and = 10, respectively. However, it achieves a relatively fair performance for larger 

For more (approximately) sparse signals with = 0.1 in Fig. 7(a), the convex approaches (GPSR 
and FPCas) outperform the hard thresholding methods (DORE, MB-IHT, NIHT) when E [Yl^.^i qi\/p > 
0.0655. For less sparse signals with = 10 in Fig. 7(b), the convex approaches outperform the hard 
thresholding methods over the entire range of expected significant coefficient ratios. With the exception 
of MP-EM and MP-EMqpt, GPSR and FPCas have smaller MSEs than all the other methods in Fig. 7(a) 
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When EELi?«]/P> 0.104. 

MB-IHT, which employs a greedy tree approximation and deterministic tree structure, achieves quite 
a poor MSE performance in Figs. 5-7. A relatively poor performance of MB-CoSaMP (which employs 
the same deterministic tree structure) has also been reported in [5, Sec. IV.B]. 

B. Image Reconstruction Using Gaussian I.I.D. Sampling Matrices 

We reconstruct the 128 x 128 'Cameraman' image from compressive samples generated using random 
sampling matrices $ with i.i.d. standard normal elements and the p x p orthogonal inverse Haar wavelet 
transform matrix ^. Here, the matrix-to-vector conversion operator v{-) is based on the MATLAB wavelet 
decomposition function wave dec 2 with Haar wavelet, which has also been used in [3] and [5]. Our 
performance metric is the average MSE of a signal coefficient vector estimate s: 

MSE{i} = (40) 
P 

computed using 10 Monte Carlo trials, where averaging is performed over the random Gaussian sampling 
matrices $. 

Here, we employ DOREapp that approximates the {H H'^)~^ — p\ ($$^)~^ term by {p%/p) In, which 
is justified by the fact that E $[$$^] — pIn holds in this example, see also (37). For DOREapp, we apply 
the following empirical Bayesian estimate of random signal vector z [11, eq. (16)]: 

^(+oo) ^ g(+oo) ^ H^[HH^)-\y - Hs^+^^) (41) 

where s^~^°°^ denotes the sparse signal estimates obtained upon convergence of DOREapp iteration and 
the [H H^)~^ term is approximated by {p%/p) In- We set the sparsity level r for NIHT and DOREapp 
as 2000 N/p and 2500 N/p for MB-IHT, tuned for good MSE performance. The convergence threshold 
in (36) is set to 5 = 10~^. The grid length in MP-EM is set as X = 12 and the tuning parameters for 
MP-EM are chosen as 

7^ = 1000, e' = 0.1, P,„„t = /'h = 0.2, Pl = 10"^ (42) 

For GPSR and FPCas> we tuned the regularization parameter r manually by varying a with the set 
{— 1, —2, —3, —4, —5, —6, —7, —8, —9} : the best reconstruction performances are achieved for a = —3. 
When applying the turbo- AMP method, we scale the sensing matrix as i^scaie — apply 
the same scaling correction as in the example in Section V-A. 

Fig. 8 shows the MSE performances of different algorithms as functions of the normalized number of 
measurements (subsampling factor) N/p. MP-EM achieves the best MSE when N/p < 0.35. The MSEs 
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Fig. 8. MSEs as functions of the subsampling factor N/p. 



of GPSR and FPCas are close to each other and smaller than those of DOREapp, NIHT, and MB-IHT for 
all N/p and the MSE of MP-EM is 1.4 to 2.4 times smaller than that of GPSR and FPCas, see Fig. 8. 

MB-IHT has the largest MSE for most N/p, which is likely due to the fact that it employs the 
deterministic tree structure, as discussed earlier. 

For N/p < 0.35, turbo-AMP performs similarly to DOREapp, NIHT, and MB-IHT, but it outperforms all 
other methods for N/p > 0.35. The reasons why turbo-AMP performs well for large N/p, outperforming 
all competitors, are likely the foUowings: 

• it uses a more general prior on the binary state variables than our MP-EM method, which allows the 
tree probability parameters Pn, Pl, 7^, and to vary between the signal decomposition levels, and 

• learns the tree probability parameters parameters from the measurements. 

In contrast, our MP-EM method employs the crude choices of the tree and other tuning parameters in (42). 

C. Large-scale Image Reconstruction Using a Structurally Random Sampling Matrix 

We now reconstruct the standard 256x256 'Lena' and 'Cameraman' images. As in Section V-B, the matrix- 
to-vector conversion operator v{-) is based on the MATLAB wavelet decomposition function wavedec2 
with Haar wavelet. The sampling matrix $ is generated from structurally random compressive samples [22] 
and the transform matrix \E' in (37) is the p x p orthogonal inverse Haar wavelet transform matrix, which 
implies that the sensing matrix H has orthonormal rows: H = 1^ and, consequently, = ph = 1. 
Our performance metric is the peak signal-to-noise ratio (PSNR) of an estimated signal s: 

PSNR (dB) = 10 logio {i^^%^4l7^^|- (^^3) 



20 



Here, we employ the exact DORE and the exact random signal estimate in (41), which are computa- 
tionally tractable because H has orthonormal rows. We set the sparsity level r for NIHT and DORE as 
10000 N/p and 15000 N/p for MB-IHT, tuned for good PSNR performance. The convergence threshold 
in (36) is set to 5 = 0.1. The tuning parameters for MP-EM are given in (42) and the grid length in 
MP-EM is set as X = 12, the same as in Section V-B. We tuned the regularization parameters r in (34b) 
for FPCas and GPSR manually and found that the best performance is achieved when a = —3 for both 
algorithms. 

When applying the turbo- AMP method, we scale the sensing matrix as i^scale — {s/p/N) With 
measurements y and scaled sensing matrix i?scaie> turbo-AMP returns the scaled signal estimate Sscaio 
and we compute the final turbo-AMP signal estimate as (^/p/N) Sscaio whose performance is evaluated 
using (43). Our empirical experience shows that scaling the sensing matrix improves the reconstruction 
performance of the turbo-AMP algorithm in this example. 

Fig. 9 shows the PSNRs and CPU times achieved by various methods when reconstructing the 256 x 256 
'Lena' image. For N/p < 0.4, the proposed MP-EM method outperforms all other methods, where the 
performance improvement compared with the closest competitor varies between 2.4 dB and 2.6 dB. For 
N/p > 0.4, turbo-AMP outperforms all other methods. In terms of CPU time, DORE and NIHT are the 
fastest among all the methods compared. It takes around 7 seconds as the runtime for turbo-AMP at each 
measurement point. MP-EM is 1.5 to 2.3 times slower than turbo-AMP, but obviously faster than GPSR, 
FPCas, and MB-IHT. ^ 

Fig. 10 shows the PSNRs and CPU times achieved by various methods when reconstructing the 256 x 256 
'Cameraman' image. For N/p < 0.4, the proposed MP-EM method outperforms all other methods by at 
least 2.6 dB. For N/p > 0.4, turbo-AMP outperforms all other methods, but performs quite poorly for 
N/p < 0.35: a similar pattern that occurs also in Fig. 9. According to Fig. 10(b), both DORE and NIHT 
consume less than 4 s in terms of CPU time. It takes around 7 s for turbo-AMP at every measurement 
point. MP-EM is still consistently faster than GPSR, FPCas, and MB-IHT, and requires 4.0 to 10.8 s 
more than turbo-AMP. 

In Figs. 9 and 10, MB-IHT achieves a fair performance and consumes the longest CPU time. 

Figs. 11 and 12 show the reconstructed 256 x 256 'Lena' and 'Cameraman' images by different 
methods for N/p = 0.375, respectively: The MP-EM algorithm achieves better reconstructed image quality 
compared with the other methods. 

'Regarding the reported CPU time, note that the turbo-AMP code does not use MATLAB only, but combines MATLAB and JAVA codes. 
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Fig. 9. (a) PSNRs and (b) CPU times as functions of the subsampling factor N/p for the 256 x 256 'Lena' image. 
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Fig. 10. (a) PSNRs and (b) CPU times as functions of the subsampling factor N/p for the 256 x 256 'Cameraman' image. 



VI. Concluding Remarks 

We presented a Bayesian EM algorithm for reconstructing approximately sparse signal from compressive 
samples using a Markov tree prior for the signal coefficients. We employed the max-product belief 
propagation algorithm to implement the M step of the proposed EM iteration. Compared with the existing 
message passing algorithms in the compressive sampling area, our method does not approximate the 
message form. The simulation results show that our algorithm often outperforms existing algorithms for 
simulated signals and standard test images with different sampling operators. 

Our future work will include the convergence analysis of the MP-EM algorithm, incorporating other 
measurement models, using a more general prior on the binary state variables, and designing schemes for 
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learning the tree parameters from the measurements. 

Appendix 
Appendix A 

Derivation of the EM Algorithm and Proofs of Its Monotonicity and (15) 
Consider the hierarchical two-stage model in (1 1). The complete-data posterior distribution for known is 

y/det[C(cT2)] 

• exp[-0.5 \\z - s\\l/a'^ - 0.5 D'^q) s/a^] (Ala) 

where 

C{a^) = a\lN - HH^) (Alb) 

and 

Pz\a^,y,0{^\o-'^^ 2/' ^) = Pz|a2,y,s(2|(7^ y, s) = 7V(z|E ^|^2_y_s(z|a^ J/, s), COV^|<,2,y,^(z|a^ t/, s)) 

(Ale) 

where 

E.|.^^,.(z|a^?/,s) = + Va^j-H^^^lC^l^'jl-'y + sM (Aid) 

cov,|,2,^,,(z|a2, y, s) = {//^[^(a^)]-^// + I^/a'')-^ (Ale) 

By using the matrix inversion lemma [23, eq. (2.22), p. 424]: 

{R + STU)-^ = - R-^S{T-^ + UR-^S)-^UR-^ (A2a) 

and the following identity [23, p. 425]: 

{R + STUy^ST = R-^S{T-^ + UR-^S)-^ (A2b) 

we obtain 

E,\,2^y^,{z\a^,y,s) = s + H^{y-Hs) (A3) 

which leads to (12). 

The objective function Inp^ | a'^^y{6 \ cr^, y) that we aim to maximize in Section EI satisfies the following 
property in the EM iteration: 

]npei,2je\a^y) = Q{d\e^^^) - n{d\d^^^) (A4a) 

where 

Q(6>|6>(^')) ^ E,|.2,^,0 [\npg^,\„2je,z\a',y)\a',y,6^'^] (A4b) 
^ E,|,2,y,« [lnp,|,2,^,«(z|(7^2/,0)|(7^y,0(^■)] (A4c) 
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From (Ala) and (A3), Q{e\e^^^) could be computed as 

Q(6»|6»(^)) = const + E,|,2,y,e { - 0.5(2/ - Hzf[C{a^)]-\y - Hz) - 0.5 \\z - s\\l/a^ 

-0.5 s^D-\q) s/a' + ln[p,(g)] + 0.5 ln(6V7') E 1'^ \ ^^'^} 

i=l 

= const - 0.5 ^ ^Il2 + ^ ^ ^ 1^^^^^)] ^ ^ (A5^ 

1=1 

where const denotes the terms that are not functions of d and (13a) follows. Since Q{0\O^^^) is maximized 
at 0^^'^^\ we have 

Q(0ij+^)\0(j)) > Q(e^^)\e(^)) (A6) 

and (14) follows from (A4a) by using the inequalities (A6) and 

< U[eii)\eii)) (A7) 

where (A7) is a consequence of the fact that 'H{d\d^^^) is maximized with respect to at = 0^^\ 

Proof of (15).- For a given q, (A5) is a quadratic function of s that is easy to maximize with respect 

to s: 

argmax Q{e\e^^^) = [D-^{q) + /J z^^^. (A8) 

Therefore, the estimates of s and qr obtained upon convergence of the EM iteration in Section in to its 
fixed point satisfy: 

^(+00) ^ [£,-l(g(+oo)) + jJ-l_^(+oo) 

= + Ip] + H^{y~H s(+°°))] (A9) 

where the second equality follows by using (12). Solving (A9) for s^~^°°^ yields 

and (15) follows. □ 

Appendix B 

Derivation of the Messages and Beliefs in Section III-A 
Before we proceed, note the following useful identities: 

argmsixj\f{zi; Si, a'^) J\f{si; 0,r^) = „ , % (Bla) 
Si a + T 

maxA^(z, ; s,, a^) N{si ; 0, r^) = } exp ( - 0.5 • (Bib) 
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I Upward Messages 

1) Upward Messages from Leaf Nodes: When passing upward messages from the leaf nodes i e TTeaf. 
we set the multipHcative term Y[kech{i) ''^k^ii^i) yielding [see (22)] 

mi_,7r(i)(?7r(j)) = a max{V'j(0i) V'i,7r(i)(?j,?7r(i))} 

= a max {AA(z, ; s,, a^) [Misi ; 0, 7^')]^* [Afisi ; 0, eV')]^-«^ 

■[P^^ (1 - Ph)^"^T"''^ [Pl (1 - Pl)^"^^ "^"W}. (B2) 

For = 0, we have 

m,^,(,)(0) = //^(O) = a max {jV(^, ; s,, a') [J\f{si ; 0, tV^)]''' [J\f{s, ; 0, eV^)]^-''' P^'' (1 - ^'l)'"*} 

2 2 

- ai max 1(1 - Pl) exp ( - 0.5 „ „ J /e, Pl exp ( - 0.5 — (B3a) 

and, for = 1, we have 

m,^^(,)(l) = /^^(l) = a max {X{z, ; 5,, a^) [Ar(s, ; 0, 7^^)]'?' [Ar(s, ; 0, e^a')]'-^^ (1 - Pn)'-'^^} 

= a, max {(1 - Ph) exp ( - 0.5 ^^^) /e, exp ( - 0.5 ^^^^) h} (B3b) 

where we have used (Bib) with — cr^e^ and = (7^7^ and a > and cti > are appropriate 
normalizing constants. It follows from (Bla) that the only two candidates for di in the maximization of 
(B2) are [0,s,(0)]^ and [l,s,(l)]^. 
In summary, 

m.^,(,)(g,(i)) = KlOjl'-''-^ K(l)]''^« (B4a) 
and (B3a) and (B3b) can be rewritten as 

= max{i/S,J/(max{i/S,.} + max{iy^_J) (B4b) 
//^(l) = max{i/^ .}/(max{i/S .} + max{i/^_.}) (B4c) 

and j, i^i" j, and (^(2;) were defined in (23c), (23d), and (23f). 

2) Upward Messages from Non-Leaf Nodes: For i e T\T\eai, we can use induction to simplify the 
multiplicative term nitech(i) "^fe->i(?i) ™^ (22) as follows: 

n = [ n /^^(o)]'"'' [ n /^^(i)]'' ^^s) 

feech(i) A;ech(i) fcech(i) 

see also Fig. 3(a). 

Substituting (B5) into (22) yields 

m^^(i){q7Tii)) = a max {i>i{0i) ipi,n(i){qi: q^{{)) JJ mk-^i{qi)} 

kech{i) 

= a max ; s,, a') ; 0, fa^)]^^ ; 0, e'a')]'"^^ [P^^ (1 - Ph)^-^^^^ 

.[p.,^l_p^)i-..]i-..«[ -Q ;,u(o)]i-^^[ J] f^lilT). (B6) 

fcech(i) fcech(i) 
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For = 0, we have 

m,^,(,)(0) =amax{Ar(^,; s,, a^) [^{3,; 0, 7V)]^MAr(s, ; 0, eV)]i-«^ (1 - Pl)'"* 

fcGch(i) A;Gch(i) 



2 

= aimax{(l-PL)[ J] /^^(O)] exp ( - 0.5 ^^^^^J /e, 

A;6ch(i) 

[ n /^^(l)] ( - ^2+L2 )/7} (B7a) 

and, for q'^(j) = 1, we have 

m,^^(,)(l) = a max {j\f{zi ; s^, a^) [Af{si ; 0, 7^^)]"* ; 0, eV^)]^-^* P^^ (1 - Ph)'"^^ 

•[ n /^^(o)]'"'M n 

A;Gch(i) A;Gch(i) 

= Q;imax{(l-PH)[ J] /^fcW] exp ( - 0.5 ^^^^)/e, 

A;ech(i) 

[ n /^^(l)] ( - ^2+L2 )/7} (B7b) 

where we have used (Bib) with = cr^e^ and = (7^7^ and a > and ai > are appropriate 
normaUzing constants. 



In summary. 



= k"(0)]'-''-« (B8a) 



where 



/.^(O) = ms^xii^l^ r7a/(max{r/S, © V-} + max{i/^, r/^}) (B8b) 

fxfil) = max{i/^^, rjf}/{max{ul, + max{i/^^, t]^}) (B8c) 

and 

r?^ = O K- (B8d) 

feech(i) 

The general upward message form in (23) follows by combining (B4) and (B8). 
// Downward Messages 

Based on the results in Section III-Al and Appendix B-I, we simplify the product of upward messages 
sent from the siblings of node i in (24) as follows [see (23a)]: 

n m,^.(,)(?.(,)) = [ n /i^(o)]i-^^« [ n (B9) 

fcGsib(i) fe6sib(i) fe6sib(i) 

see also Fig. 3(b). 
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1) Downward Messages from Root Nodes: For the node 7r(i) e 7^oot> we set the message 
i)(?7r(i)) to one, yielding [see (24)] 

"^,r(iHi(?i) = «^ax JJ mfc^^(j)(g^(i))|. (BIO) 

Substituting (B9) into (BIO) yields 

{qi) = amax < '?/'7r(i)(^7r(j)) i'i,TT{{){Qi, Q-K(i)) II mk^v{i){(lTT{i)) \ 

feGsib(i) 

= a max ; a^) [P,ootM{s^(i) ; 0, 7V2)]^-(^) [(1 - Proot)M{s^ii) ; 0, eV^)]!-^-^ 

fc6sib(i) fe£sib(i) 

For Qi — 0, we have 

m^(i)^i(0) = a max {M{z^(^^ ; a^) [A/'(s^(i) ; 0, 7V2)]«-« [A/'(s^(i) ; 0, eV^)]^-^-^ 



fcesib(i) k&\h{i) 

v2 



= Q;imax{(l-P,ooO(l-^L)[ J] A^fc(O)] exp ( - 0.5 ^l^^^ )/^^ 

Proot(l-PH)[ n /^^(l)]e^p(-0-5^^f^)/7} (B12a) 



fcGsib(j) 

and for gj = 1, we have 

m^(i)^i(l) = a max ; ^^(i),^^) [7V(s^(i) ; 0,7V2)]'^-« [7V(s^(i) ; 0,eV2)]^-^"« 

•{(1 - Proot) Pl [ n /^^Wll'-^^^i^'rootPHl H /^^(l)])"^^'^ } 
A;£sib(i) fe€sib(j) 

= aimax|(l-Proot)PL[ H /^fcW] exp ( - 0.5 + 

fe£sib(i) 

i^rootPnl n /^fc(l)]^^p(-0-5^^f^)/7} (B12b) 

A;£sib(i) 

where we have used (Bib) with = cr^e^ and = (7^7^ and o; > and ai > are appropriate 
normalizing constants. The only two candidates to maximize (BIO) are [0, S7r(i)(0)]^ and [1, S7r(i)(l)]^- 
In summary, 

= [^im'~'' [l4i'^)]'' (B13a) 

where 

/.f (0) = max{< r;f}/(max{< r;f} + max{<, r;f }) (B13b) 
= max{i/i, r7f}/(max{i/^,, ryf} + max{i/i, ryf }) (B13c) 
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and 

< =[l-Pu l-Pnf® ct>{z^ii)) Q[Q y^l] (B13d) 

fe6sib(i) 

= [ Pl, Ph ]^ 4>M ®[Q l4\ (B13e) 

fe6sib(i) 

r/f = [1-Proot, Proot]^. (B13f) 

2j Downward Messages from Non-Root Nodes: For the node 7r(i) e (T\7^oot)\'7reaf> using the same 
strategy as above, (24) simplifies as 

^^{ij-^iiqi) = a max 7r(«)(^7r(j)) i^i,Tvii){Qi, QTv{i)) "T'gp(i)->7r(i)(?7r(j)) JJ^ "7,A;-^^(j) (?7r(i)) | 

fc€sib(i) 

= a max ; a^) ; 0, 'y'a')Y-(^^ [Af{s^ii) ] 0, eV^jj^-'^-w 

fcGsib(i) fe6sib(i) 

•[<i)(0)]^-^-«[<.)(l)]^-«} (B14) 
For Qi = 0, we have 

m^(i)^i(0) = a max {^/'(^^(i) ; cr^) ; 0, -f'^a'^)^^') [^f{s^{^) ; 0, eV2)]^-«-« 

•{<.)(o)(i-pl)[ n /^^(o)]}'"'^'"' (1 - ^h)[ n /^^(i)]^'''} 



feGsib(i) feGsib(i) 

v2 



= aimax{/.;J(,)(0)(l-PL)[ J] /^^W] exp ( - 0.5 -^^^)/6, 

fcGsib(i) 

(1-Ph)[ n /^^(l)]exp(-0.5 ^^^-^^^ )/7} (B15a) 



feesib(i) 

and for gj = 1, we have 



fcGsib(i) fcGsib(j) 

= max {/.^(,)(0) Pl [ H /^fe(0)] ^xp ( - 0.5 -^^|^)/e, 

feesib(i) 

f^U){^) Ph [ n /^^(l)] ( - ^2^^)/7} (B15b) 

A;6sib(i) 

where we have used (Bib) with = cr^e^ and = 0-^7^ and a > and ai > are appropriate 



normalizing constants. The only two candidates to maximize (B14) are [0, S7r(i)(0)]^ and [1, S7r(i)(l)]^. 
In summary, 

m.(,)^,(g,) = [l^mV-'' (B16a) 
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where 

/xf (0) = max{< 77f }/(max{< rjf} + mBx{ul, r7f }) (B16b) 
/xf (1) = max{ul, r;f }/(max{<, ijf} + max{ul, r;f }) (B16c) 

and 

rit = fiU) (B16d) 
The general downward message form in (25) follows by combining (B13) and (B16). 

/// Beliefs 

Define the vector f3i = [/3i(0), Pi{l)f as 

A(0) = max6([0, Sif), = max6([l, Sif) (B17) 

where b{6i) are the beliefs defined in (26). 

i) Beliefs for the Root Nodes: For root nodes i G T^oot, the beliefs b{6i) in (26) become 

b{0,) = aM{z, ■ s,, a') [ProotAf{s, ; 0, j^a^)]"^ [(1 - P^oot) ^^{si ; 0, e'a')]'-'^' 

■ [ n /^^(o)]'"''[ n -"^(i)]''- (^18) 

fcech(i) fcech(i) 

and (B17) simplify to 

fcech(i) 



A(l) = a^= = exp ( - 0.5 - ) P,oot n /^^W (^19b) 

yielding 

/3, = [A(0), = - Proot, Proot]^ • (B20) 

2) Beliefs for the Non-Root Non-Leaf Nodes: For i E (T\ T^^oot) \ Ticaf, the beliefs b{9i) in (26) become 
b{0,) = aAf{zr, s.,a') Msr, 0,7V)]^^ ^3,; 0,6V)]^-^^ K^W]^"^^ K^(l)]^^ 

• [ n /^^(o)]""[ n /^^(i)]" (B21) 

fcGch(i) fcech(i) 

and (B17) simplify to 

m = « y:^^/:^^ exp ( - 0.5 -^^) /^f (0) J] /^^(O) (B22a) 



^^^^^ ^ " V2..V2^7V^ ^ " '^^'^ n /^^(l) (B22b) 

yielding 

A = [A(0), = Mf r;^- (B23) 
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3) Beliefs for the Leaf Nodes: For i e T\eai, the beliefs h{di) in (26) become 

(B24) 

and (B17) simplify to 



yielding 



In summary. 



Consequently, the mode di is computed as 

e, = (?,.?.(?.)) = argmax6(e.) = { ^'^^^ ■ (B27) 

Note that the normalizing constants a and ai in the above upward and downward messages and beliefs 
have been set so that mi_^^(j)(0) + mi_,^(j)(l) = 1, m^^i^^i{0) + m7r(i)-^i(l) = 1, and ^^(0) + ^^(1) = 1 
respectively. 
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